Skip to content

Conversation

@gabeiglio
Copy link
Contributor

@gabeiglio gabeiglio commented Sep 2, 2025

Closes #2166

Rationale for this change

We should block when an user wants to drop a column if that column is being referenced by either a active partition spec or sort order field.

Are these changes tested?

Yes, I added unit tests for every incompatible schema change in partitions and sort orders. Also added two new integration tests in test_catalog to test for this scenario

Are there any user-facing changes?

No

@gabeiglio gabeiglio force-pushed the partition-spec-validation branch from d4928bd to 81732d4 Compare September 2, 2025 07:05
@gabeiglio gabeiglio marked this pull request as ready for review September 2, 2025 07:06
Copy link
Contributor

@Anton-Tarazi Anton-Tarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, left a minor comment :)

@gabeiglio
Copy link
Contributor Author

@Anton-Tarazi thanks for the review (this PR went off my radar) I have rebased and applied recommended changes

@gabeiglio
Copy link
Contributor Author

cc: @Fokko @geruh This is somewhat adjacent to the recent PR especially this comment

Copy link
Contributor

@geruh geruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some suggestions to get this moving for the 0.11 release. These fix the two bugs inverted allow_missing_fields logic, and string interpolation, also, we simplify the code by using existing schema._lazy_id_to_parent

@gabeiglio gabeiglio force-pushed the partition-spec-validation branch from 7e47009 to 0a7a39e Compare January 26, 2026 11:47
@gabeiglio
Copy link
Contributor Author

Thanks for the review @geruh applied the comments

Copy link
Contributor

@geruh geruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gabeiglio this LGTM!!

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks for the pr

couple of nits but not a blocker

path = "/".join([field_str + "=" + value_str for field_str, value_str in zip(field_strs, value_strs, strict=True)])
return path

def check_compatible(self, schema: Schema, allow_missing_fields: bool = False) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesnt look like allow_missing_fields is used anywhere, should we still keep it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_compatible in Java its only used with allowMissingField=true (ctx) when reading metadata tables. Since here we only use this check (for now) at write path only, I think we could remove this field.

@gabeiglio gabeiglio force-pushed the partition-spec-validation branch from 5d62d54 to 2a65eb7 Compare January 27, 2026 14:42
@gabeiglio gabeiglio force-pushed the partition-spec-validation branch from 2a65eb7 to e12f66e Compare January 27, 2026 15:14
Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fixes! i think we're almost there

while parent_id:
parent_type = schema.find_type(parent_id)
if not parent_type.is_struct:
raise ValidationError(f"Invalid partition field parent: {parent_type}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a bit more info here too

@kevinjqliu kevinjqliu merged commit fe6b6fc into apache:main Jan 27, 2026
11 checks passed
@kevinjqliu
Copy link
Contributor

Thanks for the PR @gabeiglio and thanks for the review @jayceslesar @Anton-Tarazi @geruh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pyiceberg allows dropping the sort order column and causes table corruption on AWS Glue Catalog

5 participants